High Performance Reconfigurable Computing for Cholesky Decomposition

Authors

  • Depeng Yang
  • Gregory D. Peterson
  • Husheng Li
Abstract

This paper proposes a hardware accelerator for Cholesky decomposition on FPGAs, built around a single triangular linear equation solver. Good performance is achieved by reordering the computation of the Cholesky factorization algorithm, which alleviates its data dependencies. A dedicated hardware architecture for solving triangular linear equations is designed and implemented with customized precisions to meet different accuracy requirements. Compared with a software implementation on an Intel Xeon quad-core microprocessor, our design achieves a speedup of 7 to 13 times.
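As an illustration of how a Cholesky factorization can be expressed through a triangular solver, the sketch below uses the textbook "bordering" formulation: row i of the factor L is obtained by forward substitution against the rows already computed. This is only a software sketch under that assumption; the abstract does not specify the paper's actual reordering or hardware data flow, and the function names here are hypothetical.

```python
import math


def forward_substitution(L, b, n):
    """Solve the leading n-by-n lower-triangular system L[:n][:n] x = b."""
    x = [0.0] * n
    for i in range(n):
        s = b[i] - sum(L[i][k] * x[k] for k in range(i))
        x[i] = s / L[i][i]
    return x


def cholesky_by_triangular_solves(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite).

    Hypothetical helper: each row of L is produced by one triangular solve,
    so the triangular solver is the only non-trivial kernel.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The off-diagonal part of row i satisfies L[:i][:i] y = A[:i][i].
        rhs = [A[k][i] for k in range(i)]
        y = forward_substitution(L, rhs, i)
        for k in range(i):
            L[i][k] = y[k]
        # The diagonal entry follows from the remaining Schur complement.
        d = A[i][i] - sum(v * v for v in y)
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[i][i] = math.sqrt(d)
    return L


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    for row in cholesky_by_triangular_solves(A):
        print(row)
```

In this formulation the square root and the inner-product update are small per-row steps, while the bulk of the arithmetic sits in the triangular solve, which is why a single well-optimized solver can carry most of the work.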

Similar resources

A Reconfigurable Processing Element for Cholesky Decomposition and Matrix Inversion

Fixed-point simulation results are used for the performance measure of inverting matrices by Cholesky decomposition. The fixed-point Cholesky decomposition algorithm is implemented using a fixed-point reconfigurable processing element. The reconfigurable processing element provides all mathematical operations required by Cholesky decomposition. The fixed-point word length analysis is based on s...

A Reconfigurable Processing Element Implementation for Matrix Inversion Using Cholesky Decomposition

Fixed-point simulation results are used for the performance measure of inverting matrices using a reconfigurable processing element. Matrices are inverted using the Cholesky decomposition algorithm. The reconfigurable processing element is capable of all required mathematical operations. The fixed-point word length analysis is based on simulations of different condition numbers and different ma...

Variable Precision Floating-Point Divide and Square Root for Efficient FPGA Implementation of Image and Signal Processing Algorithms

Field Programmable Gate Arrays (FPGAs) are frequently used to accelerate signal and image processing algorithms due to their flexibility, relatively low cost, high performance and fast time to market. For those applications where the data has large dynamic range, floating-point arithmetic is desirable due to the inherent limitations of fixed-point arithmetic. Moreover, optimal reconfigurable ha...

Experiments with Cholesky Factorization on Clusters of SMPs

Cholesky factorization of large dense matrices is an integral part of many applications in science and engineering. In this paper we report on experiments with different parallel versions of Cholesky factorization on modern high-performance computing architectures. For the parallelization of Cholesky factorization we utilized various standard linear algebra software packages and present perform...

Two Preconditioners For Voxel μFEM Simulation

Two parallel iterative solvers for large-scale linear systems related to μFEM simulation of human bones were developed. The considered benchmark problems represent the strongly heterogeneous structure of real bone specimens. The voxel data are obtained by a high resolution computer tomography. Non-conforming Rannacher-Turek finite elements are used for discretization of the considered problem o...

Publication date: 2009